MP968 Experimental Design Workshop

Leighton Pritchard

University of Strathclyde

2025-11-24

Why do we need experimental design?

We should not cause unnecessary suffering

We should always minimise suffering

This may mean not performing an experiment at all. Not all new knowledge or understanding is worth the suffering needed to obtain it.

Where there is sufficient justification to perform an experiment, we are ethically obliged to minimise the amount of distress or suffering that is caused, by designing the experiment to achieve this.

Why we need statistics

It may be easy to tell whether an animal is well-treated, or whether an experiment is necessary.

But what is an acceptable (i.e. the least possible) amount of suffering necessary to obtain an informative result?

Challenge

Quiz question

Suppose you are running a necessary and useful experiment with animal subjects, where the use of animals is morally justified. You are comparing a treatment group to a control group. Which of the following choices will cause the least amount of suffering?

  • Use three subjects per group so a standard deviation can be calculated
  • Use just enough subjects to establish that the outcome is likely to be correct
  • Use just enough subjects to be certain that the outcome is correct
  • Use as many subjects as you have available, to avoid wastage

How many individuals?

The appropriate number of subjects

The appropriate number of animal subjects to use in an experiment is always the smallest number that - given reasonable assumptions - will satisfactorily give the correct result to the desired level of certainty.

  • What assumptions are reasonable?
  • What is an appropriate level of certainty?

By convention, the usual level of certainty for a hypothesis test is: “we have an 80% chance of getting the correct true/false answer for the hypothesis being tested”
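As a rough illustration of how a target such as an 80% chance of a correct answer translates into a number of subjects, the sketch below uses the standard normal-approximation formula for comparing two group means. The significance level (0.05), the standardised effect sizes, and the function name `subjects_per_group` are assumptions made for illustration, not values from this workshop. A calculation based on the t-distribution would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate subjects per group for a two-sided, two-group comparison.

    effect_size is the difference in group means divided by the common
    standard deviation (an assumed value the experimenter must justify).
    Uses the normal approximation n = 2 * (z_alpha + z_power)^2 / d^2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up: we cannot use a fraction of a subject


# Illustrative (assumed) standardised effect sizes
for d in (0.5, 1.0, 1.5):
    print(f"effect size {d}: about {subjects_per_group(d)} subjects per group")
```

Under these assumptions, halving the effect size roughly quadruples the number of subjects needed, which is why the assumed effect size has to be justified before the experiment is run.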

Design experiments to minimise suffering

Experimental design and statistics are intertwined

Once a research hypothesis has been devised:

  • Experimental design is the process of devising a practical way of answering the question
  • Statistics informs the choices of variables, controls, numbers of individuals and groups, and the appropriate analysis of results

Design your experiment for…

  • your population or subject group (e.g. sex, age, prior history, etc.)
  • your intervention (e.g. drug treatment)
  • your contrast or comparison between groups (e.g. lung capacity, drug concentration, etc.)
  • your outcome (i.e. is there a measurable or clinically relevant effect)

The 2009 NC3Rs systematic survey

The importance of experimental design

“For scientific, ethical and economic reasons, experiments involving animals should be appropriately designed, correctly analysed and transparently reported. This increases the scientific validity of the results, and maximises the knowledge gained from each experiment. A minimum amount of relevant information must be included in scientific publications to ensure that the methods and results of a study can be reviewed, analysed and repeated. Omitting essential information can raise scientific and ethical concerns.” (Kilkenny et al. (2009))

We rely on the reporting of the experiment to know if it was appropriate

Causes for concern 1

“Detailed information was collected from 271 publications, about the objective or hypothesis of the study, the number, sex, age and/or weight of animals used, and experimental and statistical methods. Only 59% of the studies stated the hypothesis or objective of the study and the number and characteristics of the animals used. […] Most of the papers surveyed did not use randomisation (87%) or blinding (86%), to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods described their methods and presented the results with a measure of error or variability.” (Kilkenny et al. (2009))

We cannot rely on the literature for good examples of experimental design

Causes for concern 2

No publication explained their choice for the number of animals used

We cannot rely on the verbal authority of ‘published scientists’ or ‘experienced scientists’

Very strong cause for concern

“Power analysis or other very simple calculations, which are widely used in human clinical trials and are often expected by regulatory authorities in some animal studies, can help to determine an appropriate number of animals to use in an experiment in order to detect a biologically important effect if there is one. This is a scientifically robust and efficient way of determining animal numbers and may ultimately help to prevent animals being used unnecessarily. Many of the studies that did report the number of animals used reported the numbers inconsistently between the methods and results sections. The reason for this is unclear, but this does pose a significant problem when analysing, interpreting and repeating the results.” (Kilkenny et al. (2009))

Important

As scientists, you - yourselves - need to understand the principles behind the statistical tests you use, in order to choose appropriate tests and methods, and to use appropriate measures to minimise animal suffering and obtain meaningful results.

You cannot simply rely on the word of “experienced scientists” for this.

The ARRIVE guidelines

The next year, Kilkenny et al. (2010) proposed the ARRIVE guidelines: a checklist to help researchers report their animal research transparently and reproducibly.

  • Good reporting is essential for peer review and to inform future research
  • Reporting guidelines measurably improve reporting quality
  • Improved reporting maximises the output of published research

ARRIVE guidelines highlights

Many journals now routinely request information in the ARRIVE framework, often as electronic supplementary information. The framework covers 20 items including the following (Kilkenny et al. (2010)):

ARRIVE guidelines (highlights)

    1. Objectives: primary and any secondary objectives of the study, or specific hypotheses being tested
    2. Study design: brief details of the study design, including the number of experimental and control groups, any steps taken to minimise the effects of subjective bias, and the experimental unit
    3. Sample size: the total number of animals used in each experiment and the number of animals in each experimental group; how the number of animals was decided
    4. Statistical methods: details of the statistical methods used for each analysis; methods used to assess whether the data met the assumptions of the statistical approach
    5. Outcomes and estimation: results for each analysis carried out, with a measure of precision (e.g., standard error or confidence interval).

A vital step

Warning

“A key step in tackling these issues is to ensure that the next generation of scientists are aware of what makes for good practice in experimental design and animal research, and that they are not led into poor or inappropriate practices by more senior scientists without a proper grasp of these issues.”

Recommended reading

Bate and Clark (2014)

Some Statistical Concepts

Probability distributions

The probability distribution of a random variable \(z\) (e.g. the values you measure in an experiment) describes the range of values that \(z\) can take, and how likely each of those values is

The mean of the distribution of \(z\)

  • The mean (aka expected value or expectation) is the average of all the values in \(z\)
    • Equivalently: the mean is the value that is obtained on average from a random sample from the distribution
  • Written as \(\mu_{z}\) or \(E(z)\)

The variance of a distribution of \(z\)

  • The variance of the distribution of \(z\) is the expected squared difference between a value sampled at random from the distribution and the mean \(\mu_z\) (or \(E(z)\)).
    • \(\textrm{variance} = E((z - \mu_z)^2)\)

Understanding variance

A distribution where all values of \(z\) are the same

  • Every single value in the distribution (\(z\)) is also the mean value (\(\mu_z\)), therefore

\[z - \mu_z = 0 \implies (z - \mu_z)^2 = 0\] \[\textrm{variance} = E((z - \mu_z)^2) = E(0^2) = 0\]

All other distributions

In every other distribution, some values of \(z\) differ from the mean so, for at least some values of \(z\):

\[z - \mu_z \neq 0 \implies (z - \mu_z)^2 \gt 0 \] \[\implies \textrm{variance} = E((z - \mu_z)^2) \gt 0 \]

Standard deviation

What is standard deviation?

The standard deviation is the square root of the variance

\[\textrm{standard deviation} = \sigma_z = \sqrt{\textrm{variance}} = \sqrt{E((z - \mu_z)^2)} \]

Advantages

  • The standard deviation (unlike variance) is on the same scale as the original distribution
    • Standard deviation is a more “natural-seeming” interpretation of variation
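The definitions of mean, variance and standard deviation above can be checked numerically. The sketch below applies them to a small set of hypothetical measurements (the values are invented for illustration) that is treated as the whole distribution, so every value is weighted equally.

```python
from statistics import fmean

# Hypothetical measurements, treated here as the whole distribution
z = [2, 4, 4, 4, 5, 5, 7, 9]

mu = fmean(z)  # mean: the average of all values

# variance: the average squared difference from the mean
variance = fmean([(value - mu) ** 2 for value in z])

# standard deviation: the square root of the variance
sd = variance ** 0.5

print(mu, variance, sd)  # 5.0 4.0 2.0
```

The variance (4.0) is in squared units of the measurements, while the standard deviation (2.0) is back on the same scale as the values themselves. When the data are a sample used to estimate the variance of a larger population, the usual estimator divides by \(n - 1\) rather than \(n\); Python's `statistics.variance` and `statistics.stdev` do this.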

Note

We can calculate mean, variance, and standard deviation for any probability distribution.

Normal Distribution 1

\[ z \sim \textrm{normal}(\mu_z, \sigma_z) \]

Note

We only need to know the mean and standard deviation to define a unique normal distribution

Tip

Measurements of variables whose value is the sum of many small, independent, additive factors may follow a normal distribution

Important

There is no reason to expect that a random variable representing direct measurements in the world will be normally distributed!
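The Tip above can be explored with a small simulation. In this sketch each simulated measurement is the sum of many small, independent factors; the individual factors are uniform rather than normal, and the number of factors, number of samples and random seed are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the sketch is repeatable

N_FACTORS = 50      # small additive factors per measurement (assumed)
N_SAMPLES = 10_000  # number of simulated measurements (assumed)

# Each measurement is the sum of many small, independent uniform(0, 1) factors
measurements = [
    sum(random.random() for _ in range(N_FACTORS)) for _ in range(N_SAMPLES)
]

mean = fmean(measurements)
sd = pstdev(measurements)

# For a normal distribution, about 68% of values lie within one sd of the mean
within_one_sd = sum(abs(m - mean) <= sd for m in measurements) / N_SAMPLES

print(f"mean = {mean:.2f}, sd = {sd:.2f}, within 1 sd = {within_one_sd:.1%}")
```

The fraction of simulated values within one standard deviation of the mean should come out close to the 68% expected of a normal distribution, even though no single factor is normally distributed. This only illustrates the additive case; measurements generated by other processes (e.g. multiplicative ones) need not behave this way.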

Normal Distribution 2

Normal Distribution 3

Binomial Distribution 1

Suppose you’re taking shots in basketball

  • how many shots?
  • how likely are you to score?
  • what is the distribution of the number of successful shots?

Tip

This kind of process generates a random variable approximating a probability distribution called a binomial distribution.

It is different from a normal distribution.

Binomial Distribution 2

\[ z \sim \textrm{binomial}(n, p) \]

Tip

  • number of shots, \(n = 20\)
  • probability of scoring, \(p = 0.3\)

\[z \sim \textrm{binomial}(20, 0.3) \]

mean and sd

\[ \textrm{mean} = n \times p\] \[ \textrm{sd} = \sqrt{n \times p \times (1-p)}\]
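As a check on the formulas above, the sketch below builds the full binomial distribution for the basketball example (\(n = 20\), \(p = 0.3\)) and computes its mean and standard deviation directly from the probabilities. The helper name `binomial_pmf` is an assumption for illustration.

```python
import math

n = 20   # number of shots
p = 0.3  # probability of scoring each shot


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of each possible number of successful shots, 0 to n
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and sd computed directly from the probabilities...
mean = sum(k * pk for k, pk in enumerate(pmf))
sd = math.sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)))

# ...agree with the closed-form expressions n*p and sqrt(n*p*(1-p))
print(f"mean = {mean:.3f}, n*p = {n * p:.3f}")
print(f"sd = {sd:.3f}, sqrt(n*p*(1-p)) = {math.sqrt(n * p * (1 - p)):.3f}")
```

On average you would expect to score 6 of the 20 shots, with a standard deviation of about 2 shots.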

Poisson distribution

\[z \sim \textrm{poisson}(\lambda)\]

Poisson distribution

  • Used for count/rate data, e.g.
    • the number of cases of cancer in a county
    • the number of \([\textrm{Ca}^{2+}]\) spikes in a given time period

Expectation (\(\lambda\))

  • Only one parameter is provided, \(\lambda\): the rate with which the measured event happens

  • Suppose a county has a population of 100,000, and the average rate of cancer is 45.2 per million people each year

\[z \sim \textrm{poisson}(45.2 \times 100,000/1,000,000) = \textrm{poisson}(4.52) \]
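The county example can be worked through numerically. This sketch reads the rate as 45.2 cases per million people per year (the reading that yields \(\lambda = 4.52\) for a population of 100,000), and the helper name `poisson_pmf` is an assumption for illustration.

```python
import math

population = 100_000
rate_per_million = 45.2  # assumed reading: cases per million people per year

# Expected number of cases per year in this county
lam = rate_per_million * population / 1_000_000


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


print(f"lambda = {lam:.2f}")
for k in range(10):
    print(f"P(z = {k}) = {poisson_pmf(k, lam):.4f}")
```

Although the expected count is 4.52 cases per year, any individual year can produce fewer or more cases, and only whole numbers of cases can be observed. For a Poisson distribution the variance is equal to the mean \(\lambda\), so the standard deviation here is \(\sqrt{4.52} \approx 2.13\).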

Binomial and Poisson distributions

All values are non-negative whole numbers (zero is a possible outcome)

Distributions in Practice

Distributions are starting points

  • Distributions arise from and represent distinct generation processes
    • Normal distributions are good for sums, differences, and averages
    • Poisson distributions are good for counts
    • Binomial distributions are good for success/failure outcomes

Warning

  • All statistical distributions are idealisations that ignore many features of real data
  • No real world data should be expected to exactly match any statistical distribution
  • Poisson models tend to need adjustment for overdispersion

Normal Distribution Redux

Probability mass

  • approximately 50% of the distribution lies in the range \(\mu \pm 0.67\sigma\)
  • approximately 68% of the distribution lies in the range \(\mu \pm \sigma\)
  • approximately 95% of the distribution lies in the range \(\mu \pm 2\sigma\)
  • approximately 99.7% of the distribution lies in the range \(\mu \pm 3\sigma\)
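These proportions can be checked against the standard normal distribution using Python's standard library. The helper name `mass_within` is an assumption for illustration. The multiplier that encloses 50% of the distribution is computed rather than assumed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def mass_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {mass_within(k):.4f}")

# Number of standard deviations either side of the mean that encloses 50%:
# the central 50% runs from the 25th to the 75th percentile
k_half = z.inv_cdf(0.75)
print(f"50% lies within {k_half:.4f} sd of the mean")
```

Because any normal distribution is a shifted and rescaled copy of the standard normal, the same proportions hold for every choice of \(\mu\) and \(\sigma\).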

References

Bate, Simon T., and Robin A. Clark. 2014. The Design and Statistical Analysis of Animal Experiments. Cambridge University Press.
Kilkenny, Carol, William J Browne, Innes C Cuthill, Michael Emerson, and Douglas G Altman. 2010. “Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research.” PLoS Biology 8 (6): e1000412.
Kilkenny, Carol, Nick Parsons, Ed Kadyszewski, Michael F W Festing, Innes C Cuthill, Derek Fry, Jane Hutton, and Douglas G Altman. 2009. “Survey of the Quality of Experimental Design, Statistical Analysis and Reporting of Research Using Animals.” PLoS One 4 (11): e7824.